1. Introduction

Standards are central to the achievement and 
maintenance of accuracy in trace analysis. This fact 
is well-known and well-accepted in the interna- 
tional analytical chemical community, where 
"standards" are generally considered to be Stan- 
dard Reference Materials (SRMs) or Certified Refer- 
ence Materials (CRMs). The term, standards, 
however, is multivalued, as noted recently by a for- 
mer Director of the National Bureau of Standards 
[1]. That is, even in our more conventional view of 
trace analysis, we must consider in addition to stan- 
dard materials: standard procedures (protocols), 
standard data (reference data), standard units (SI), 
standard nomenclature, standard (certified) instru- 
ments, and standard tolerances (regulatory stan- 
dards, specifications, norms) [2]. It is interesting, in 
light of these several types of "standards" which 
have some bearing on accuracy in trace analysis, to 
consider the possible significance of standards in 
and for Chemometrics. 

To pursue this objective, we first must have a 
common understanding of the meaning of the term, 
chemometrics, and what significance it may have 
for accurate trace analysis. A concise definition is 
given by the subtitle of the volume which resulted 
from the First NATO Advanced Study Institute on 
Chemometrics, i.e., "Mathematics and Statistics in 
Chemistry" [3]. Implications for accuracy, espe- 
cially accuracy in trace analysis, are immediately 
evident. That is, wherever mathematical or statisti- 
cal operations contribute to the experimental de- 
sign, data evaluation, assumption testing, or quality 
control for accurate chemical analysis, "chemomet- 
ric standards" are at least implicitly relevant. 

The major part of this paper will be devoted to 
an explicit discussion of such chemometric standards,
including case studies drawn from recent research
at the National Bureau of Standards. The
discussion will be placed in the framework of the
Analytical System, or Chemical Measurement Pro- 
cess (CMP), for such a perspective makes it possi- 
ble to consider logically a "theory of analytical 
chemistry"; and certainly chemometrics is a very 
important part of such a theory [4,5]. To set the 
stage, the next section will include a brief view of 
the current content of Chemometrics, together 
with a summary of its history and literature. This 
article will conclude with a glimpse at the future of 
chemometrics, with special emphasis on means to 
achieve increased accuracy in our chemical mea- 
surements and increased understanding of the ex- 
ternal (physical, biological, geochemical) systems 
which provide the driving forces for analytical 
chemistry. 



2. A Brief History 

The content of Chemometrics, as viewed by the
"Working Party on Chemometrics" of the International
Union of Pure and Applied Chemistry (IUPAC), is given
in table 1 [6]. Included in the second, major portion 
of the table are titles for some 30 chapters which 
comprise an overview document being prepared 
for IUPAC. Two points are evident from the list of 
titles: (1) the scope of chemometrics is very broad 
indeed, encompassing significant portions of ap- 
plied mathematics; (2) as implied by the name, ma- 
jor emphasis is given to measurement, specifically 
chemical measurement. In a narrower sense, 
chemometrics is sometimes viewed as the intersec- 
tion of statistics and analytical chemistry, as seen 
by the emphasis on experimental design, control, 
and the analysis of signals and analytical data. The 
several chapters on signal and data analysis include 
such topics as filtering, deconvolution, time series 
analysis, exploratory data analysis, clustering, pat- 
tern recognition, factor analysis, and (multivariate) 
regression. Standards and analytical accuracy have
special relevance to the chapters on terminology, 
precision and accuracy, performance characteris- 
tics, calibration, analysis, and quality control. 



Table 1. What is chemometrics?

1. NATO Advanced Study Institute (1983)
   "Chemometrics: Mathematics and Statistics in Chemistry"

2. IUPAC Working Group on Chemometrics (1987): Scope

   Producing Chem. Information
   Notation & Terminology
   Precision & Accuracy: intralab, interlab
   Relating Chemical & Non-Chemical Data
   Performance Characteristics
   Calibration: univariate, multivariate
   Information Theory
   Optimization & Exptl. Design: sequential, simultaneous
   Signal Analysis: 4 chapters
   Data Analysis: 8 chapters
   Expert Systems: custom made, knowledge engineering tools
   Operations Research
   Graph Theory
   Robotics
   Computational Techniques (future strategies)
   Chemical Image Analysis
   Sampling Strategies
   Quality Control
   Systems Theory



A brief, chronological history of chemometrics 
is presented in table 2. To convey information on 
both the history and the literature of this discipline, 
we have indicated milestones in the form of se- 
lected references, to the extent possible. Impres- 
sive, recent growth is seen by the fact that the first 
two textbooks and the first two journals, specifi- 
cally devoted to chemometrics, were published 
within the last 2 years. Looking to the beginning of 
this history (bottom of table 2), we find the name of 
Jack Youden, certainly one of the earliest and most 
notable chemometricians, whose excellent guide to 
chemometrics was published some 20 years prior to 
the invention of the term. (Youden, incidentally, 
was a proper chemometrician, in that he began his 
career as a chemist, and then went on to become a 
distinguished statistician.) The journal Analytical
Chemistry has long served chemometrics well,
through its biennial fundamental reviews of the 
subject, starting well before the term was known. 
As indicated in table 2, the term "chemometrics" 
was conceived by Svante Wold, in January 1971. 
The reader's attention is called to the interesting 
paragraph by Wold, in reference [7], which details 
the beginnings of chemometrics, including the start 
of the Chemometrics Society by Wold and Kowal- 
ski, in Seattle on 10 June 1974. The intervening 
decade, culminating in the aforementioned NATO
Advanced Study Institute, saw rapid growth in
chemometrics education and research, much of it
promulgated by the Chemometrics Society and
published in journals such as Analytical Chemistry
and Analytica Chimica Acta. Also, there appeared
several notable texts which were largely chemo- 
metric in content, if not in title [8-13]. 

Table 2. A brief history

IUPAC (1987): Report on Chemometrics (D. L. Massart, M. Otto)

Two textbooks:
   "Chemometrics: a textbook" (1987, Elsevier)
      (Massart, Vandeginste, Deming, Michotte, Kaufman)
   "Chemometrics" (1986, Wiley)
      (Sharaf, Illman, Kowalski)

Two journals:
   Journal of Chemometrics (Jan. 1987) (Ed. Kowalski; Wiley)
   Chemometrics and Intelligent Laboratory Systems (Nov. 1986) (Ed. Massart; Elsevier)

Chemometrics Conference (NBS, May 1985), dedicated to W. J. Youden
   (Spiegelman, Sacks, Watters; NBS J. Research 90 [6])

NATO Advanced Study Institute on Chemometrics (Cosenza, Sept. 1983) (Kowalski)

"Chemometrics: Theory and Application" (1977) (Ed. Kowalski; ACS Sympos. 52)

Chemometrics Society founded (Seattle, 1974) (S. Wold, B. Kowalski)

CONCEPTION: S. Wold (1971) (J. Chemometrics, Vol. 1, No. 1, p. 1, Jan. 1987)

Analytical Chemistry (ACS): Reviews on statistics . . . mathematics . . . chemometrics (even years)

W. J. Youden, "Statistical Methods for Chemists" (1951, Wiley)

To complete this brief look at the content, history,
and literature of chemometrics, it is fitting to
refer to the Chemometrics Conference held at NBS 
just 3 years ago. It was a special meeting in many 
respects, for it epitomized the interdisciplinary na- 
ture and increasing scope of chemometrics; and it 
was "probably the first (such meeting) in the 
United States by that title" [14]. The meeting was 
jointly planned by an interdisciplinary team, con- 
sisting of a chemist and two statisticians. It was 
jointly sponsored by two national chemical and 
two national mathematical societies. Finally, it con- 
tained an extremely effective and balanced blend of 
experts from the two disciplines: mathematicians 
(and statisticians) providing critiques of chemomet- 
rics presentations by chemists, and chemists 
providing critiques of the presentations by mathe- 
maticians. The synergism resulting from this ap- 
proach is evident from examining the proceedings 
[14]. It is appropriate to conclude with reference to 
this volume, for it was dedicated to W. J. Youden, 
our first chemometrician in table 2.

3. Chemometric Standards and the Analytical System

3.1 Standards

The agenda for chemometrics, from the perspec- 
tive of standards, is outlined in table 3. First, we 
must deal with the issue of nomenclature. Because 
of the relatively recent formal emergence of 
chemometrics, and because of its interdisciplinary 
character, this is a very important matter for our 
early attention. Nomenclature, in this context, 
refers to much more than terminology. That is, it 
includes basic meaning and explicit formulation of 
concepts falling within the scope of mathematics 
and chemistry. The efforts of IUPAC, both in the 
Commission on Analytical Nomenclature [15] and 
as outlined in table 1 [6], will be extremely helpful
in this fundamental task for chemometrics: to assure
that chemists and mathematicians "speak the same
language," where that language remains as consistent
as possible with the slightly different languages of
the separate disciplines. (To
some extent, we shall have to accept a bilingual 
dictionary. For example, "efficiency," "consis- 
tency," and "sample," have somewhat different im- 
plications in statistics and analytical chemistry.) 



Table 3. Chemometric standards

Nomenclature (terminology, concepts, formulation)

Standards for accuracy (entire chemical measurement process)
   detection, identification, estimation, uncertainties, assumptions
   evaluation of chemometric techniques, software, algorithms
   validation through "standard" data; interlaboratory exercises
   design to meet external needs for adequate, accurate chemical information

Advance the state of the art; stimulate multidisciplinary cooperation



Supporting standards for accuracy, for the entire 
Chemical Measurement Process, is perhaps our 
most important task. The primary components are 
indicated under the second heading in table 3. Most 
important is a rigorous approach to the specifica- 
tion and evaluation of the fundamental characteris- 
tics of analytical methods and analytical results, 
such as detection, identification, and quantification 
(estimates and uncertainties). A combination of 
chemical knowledge (or "chemical intuition") and 
statistical expertise in this effort is the best means to 
assure validity and control through the specifica- 
tion and testing of assumptions. A second level of 
control which represents a special responsibility 
for chemometrics is the production and evaluation 
of quality software and algorithms — a responsibil- 
ity which is being met in both chemometrics jour- 
nals. The logical extension of chemical software 
standards is found in chemometric validation, or 
Standard Test Data (STD), designed to guarantee 
quality for the Evaluation step of the CMP. STD 
thus parallel SRMs for accuracy assurance in both 
intra- and interlaboratory environments [16]. It is 
worth emphasizing that with the enormous pro- 
gress in laboratory automation, and the substitution 
of machine intelligence for human intelligence, 
quality control of the mathematical or chemomet- 
ric phase of the CMP becomes ever more urgent. 
Direct instrument responses are increasingly un- 
available for the expert scrutiny of the analyst, and 
automatic results are produced with little indica- 
tion of the assumptions involved or numerical 
validity (and robustness to outliers) of the compu- 
tational methods. 
The last "standard" indicated in table 3 relates to
design. Design of the sampling, measurement, and 
data evaluation steps of the CMP to meet specified 
needs, is really the first responsibility of chemomet- 
rics. A careful blend of statistical expertise and 
chemical knowledge once again is the best means 
for meeting the accuracy or information require- 
ments of the CMP. Inadequate attention to design 
is perhaps the most serious fault in ordinary chemi- 
cal analysis. Either inconclusive or inadequate 
chemical results are obtained, using the samples 
and methods at hand, or costs are needlessly high 
in obtaining the relevant information. This area 
constitutes one of the greatest opportunities for 
chemometrics for attaining requisite accuracy at 
minimal cost; appropriate methods include infor- 
mation and decision theory, statistical design and 
optimization techniques, and exploratory multivari- 
ate approaches such as pattern recognition and 
cluster analysis [3]. 

3.2 The Analytical System 

A "systems perspective" for the CMP has been 
promulgated by a number of eminent analytical 
chemists over the past 2 decades. One of the earli- 
est and most noteworthy efforts was made by the 
Arbeitskreis "Automation in der Analyse" beginning 
in the early 1970s [4]. The systems-theoretic and
information-theoretic view, which was pioneered by members
of this circle, such as Gottschalk, Kaiser, and 
Malissa, is even more relevant today, and it offers 
perhaps the best model for an integrated chemo- 
metric approach to accuracy. Considering a simpli- 
fied representation of the CMP or analytical system 
presented for this purpose in reference [16] (fig. 2), 
for example, it is clear that not only is there mate- 
rial flow through the system, in terms of sampling, 
sample preparation, and measurement, but there is 
also the flow of information and, unfortunately,
noise. Treating the CMP as an integrated system is
essential for the optimal application (cost vs accu- 
racy) of chemometric tools for design, control, cal- 
ibration, and evaluation. Interfaces between the 
several steps of the CMP must be astutely matched 
to prevent information loss, and data evaluation 
and reporting techniques must be recognized as 
part of the overall measurement process, capable of 
preserving or distorting information just like the 
chemical and instrumental steps. The CMP or ana- 
lytical system model can be especially helpful in 
planning for accuracy through appropriate points 
of introduction of SRMs and STD, and for explicit
treatment of feedback, where initial results are utilized
for improved, on-line redesign ("learning") of
the CMP. 

Extended discussion of the analytical system is 
beyond the scope of this paper, but its introduction 
is essential for a meaningful consideration of the 
relationship of chemometrics to accuracy and stan- 
dards, as indicated above. The system view is obvi- 
ously important for designing or investigating 
overall performance characteristics, such as the 
blank, recovery, specificity, and systematic and 
random error — through propagation techniques 
[17]. That is, if one wishes to achieve an overall 
precision, or detection limit, or identification capa- 
bility, then the design of an optimal system must 
take into consideration the corresponding parame- 
ters for each step of the CMP, from sampling 
through data evaluation. Such an integrated ap- 
proach to design, with the help of chemometric tech- 
niques, is as relevant to the design of self- 
contained automated and intelligent analytical instru- 
ments, as it is to the design of an integrated analytical 
approach of an entire organization (such as CAC) to a 
broader analytical question, such as the selection and 
certification of Standard Reference Materials [4,12]. 
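
As a minimal sketch of the propagation idea, the following Python fragment combines per-step random errors into an overall precision and identifies the step that limits it. It assumes, as a simplification not stated in the text, that the steps of the CMP are independent and that their relative standard deviations combine in quadrature; the step names and numerical values are hypothetical.

```python
import math

def overall_rsd(step_rsds):
    """Combine relative standard deviations of independent CMP steps in quadrature."""
    return math.sqrt(sum(r * r for r in step_rsds.values()))

def dominant_step(step_rsds):
    """Return the step contributing the largest share of the total variance."""
    return max(step_rsds, key=lambda name: step_rsds[name] ** 2)

# Hypothetical relative standard deviations for each step of the CMP.
steps = {"sampling": 0.04, "sample preparation": 0.03, "measurement": 0.012}
total = overall_rsd(steps)
print(f"overall RSD = {total:.4f}; dominant step: {dominant_step(steps)}")
```

Under these assumptions, improving the measurement step alone would change the overall precision very little, which is the practical argument for designing the system as a whole.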

3.3 Hypothesis Testing and the CMP 

Fundamental questions to be addressed by mea- 
surement science can often be posed as hypotheses, 
to be tested or evaluated via analytical measure- 
ments. The formation of meaningful hypotheses or 
models of the external (environmental, biological) 
system is the business of expert scientists within 
that discipline. The testing of such hypotheses, 
through analytical measurements, is the business of 
expert analytical chemists. From this perspective it 
is clear that hypothesis testing captures the essence 
of the scientific method. It must therefore be a key 
feature of any "theory of Analytical Chemistry." 
This is especially important for chemometrics, for 
hypothesis testing forms one of the cornerstones of 
modern statistics. By capitalizing on the elegant 
statistical tools that have been developed for agri- 
cultural or biological testing, for example, we can 
generate an objective and optimal approach to the 
design of the CMP. That is, by combining excellent 
knowledge of chemistry with that of modern statis- 
tics, we can construct CMPs which are guaranteed 
to have sufficient (statistical) power to meet the 
specified analytical needs. In this respect, we shall 
be responding to a famous challenge by Kaiser [18], 
that analytical chemists learn to match optimally 
the "space of analytical methods" with the "space
of analytical problems." 

Some of the ways in which hypothesis testing 
impacts analytical accuracy, in terms of the fundamental
parameters of analytical chemistry, are presented
in table 4. Figures 1 and 2 convey the
elements of this theory together with its applica- 
tion to detection and univariate identification, re- 
spectively [19]. Further details cannot be presented 
here, but it should be noted that accuracy in trace 
analysis demands quantitative chemometric ap- 
proaches to detection, identification, and quantifi- 
cation (uncertainty evaluation), plus model and 
assumption validation. Inadequate attention to this 
matter, and imperfect understanding of the fundamental
($\alpha$, $\beta$ error) limitations of hypothesis testing,
i.e., chemical measurement, continue to
produce very erroneous conclusions regarding the 
results or power of our analytical techniques [19]. 
It is especially interesting and important to con- 
sider this in terms of the final, data evaluation step 
of the CMP, in view of the expanding use of "intel- 
ligent" and automated instrumentation, which gen- 
erally includes "black box" data evaluation. 
Monitoring the accuracy of such internal al- 
gorithms is clearly one of the critical tasks of 
chemometrics in the near future, one for which 
Standard Test Data (STD) may play an important 
role. The need is exhibited in figure 3, where per- 
fectly visible gamma ray peaks remain "unde- 
tected" by a widely used instrumental gamma ray 
analytical system [20]. 
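
The roles of the $\alpha$ and $\beta$ errors in detection can be illustrated with a short Python sketch. It is not taken from figures 1 or 2; it assumes the simplest textbook case of a one-sided test on a normally distributed net signal with known, constant standard deviation, and the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def critical_level(sigma0, alpha=0.05):
    """Decision threshold for the net signal under the null hypothesis (no analyte)."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha) * sigma0

def detection_power(true_signal, sigma, alpha=0.05):
    """Probability (1 - beta) that a true net signal exceeds the critical level."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - _STD_NORMAL.cdf(z_crit - true_signal / sigma)

def detection_limit(sigma, alpha=0.05, beta=0.05):
    """Smallest true signal detected with probability 1 - beta (constant sigma assumed)."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(1.0 - beta)
    return (z_alpha + z_beta) * sigma

print(critical_level(1.0), detection_limit(1.0), detection_power(2.0, 1.0))
```

With these assumptions, a true signal equal to the critical level is detected only half of the time, which is why a decision threshold should not be reported as a detection limit.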

Table 4. Analytical accuracy and hypothesis testing (a)

Hypothesis formation (external system model)

Design of the measurement process
   external ($x$, $l$, $t$)
   internal (MP, EP)

Hypotheses to be tested:
   model (simplest internal: $y = B + Ax + e_y$)
   detection, discrimination (estimation)
   no. of components (knowledge, "fit," constraints)
   identification (informing variable; pattern)
   error structure (stationary, white, cdf, variance, bias)

Some diagnostics: $z$, $t$, $t'$, K-S, $\chi^2$, $\chi'^2$, residual patterns, . . .

(a) Symbol explanation: $x$, $l$, $t$ = sampling species, location(s), time(s); MP, EP = measurement and evaluation steps of the CMP; $t'$, $\chi'^2$ = noncentral $t$ and $\chi^2$ statistics; K-S = Kolmogorov-Smirnov statistic.



Before leaving this survey of fundamentals, we 
must emphasize the importance of the first syllable. 
Chemometrics differs from statistics and mathematics
in that chemical intuition or expertise forms
an essential part of the activity. As mentioned 
above, hypothesis formation, which is necessarily 
the first step in designing a scientific experiment, 
requires disciplinary expertise. Accuracy in data 
evaluation or experiment control, for example, can 
only be expected when the chemometric tech- 
niques employed recognize the range of possible 
alternative hypotheses (models or assumptions).
This is the crux of setting reliable bounds for sys- 
tematic error, or in establishing "definitive" analyt- 
ical methods. Empirical rules or heuristic 
techniques adapted to this purpose should be 
viewed with some caution. Examples of problems 
demanding chemical expertise for alternative hy- 
potheses are identification, and the assessment of 
blank and matrix effects [17, 19 (ch. 16)]. In figure 
2, for example, knowledge of the alternative was 
essential to compute the identification power of the 
test. In the more general case, where chemical spe- 
cies are identified on the basis of spectral or chro- 
matographic patterns, we must know the locations 
and uncertainty characteristics of all "nearby" patterns
to assess the identification power for a given
null pattern, or to design a measurement process 
meeting prescribed identification capabilities. In 
moving from the universe of all possible neighbor- 
ing spectral patterns, to the universe of possible in- 
terferences [21] or calibration models, for example, 
chemometrics faces a considerable challenge. 



4. Selected Illustrations 

To illustrate the relevance of chemometrics to 
the assurance of accuracy in trace analysis, we shall 
examine three recent and continuing investigations 
from our laboratory. The first has been selected as 
an example where quantitative hypothesis testing 
techniques have been applied to one of the funda- 
mental elements of any analytical system: the 
noise. The second relates to an exploratory re- 
search study which seeks to relate patterns of laser 
microprobe mass spectra to sources of combustion 
particles ("soot") in the atmosphere. It illustrates 
the importance of chemical information (or "intu- 
ition") to maintain accuracy in the application of 
multivariate data analytical techniques. The third 
illustration speaks to the need for STD, both for 
monitoring accuracy in complex chemical data 
evaluation, and as a stimulus for research for im- 
proved chemometric techniques and understanding 
of the data evaluation process. 
4.1 Counting Statistics: Are They Poisson?

The two fundamental model characteristics of 
analytical signals are the functional relation, con- 
necting the expected value of the signal to the ana- 
lyte concentration, and the error structure, as 
indicated in table 4. Accurate measurements and 
accurate assessment of method performance char- 
acteristics demand knowledge of both. In this sec- 
tion we describe an experiment designed to 
investigate the statistical properties and the causal 
characteristics of the noise component in counting 
experiments. Such experiments, where individual 
atoms, ions, or photons are counted, comprise 
some of the most sensitive in analytical measure- 
ment. In many such cases it is assumed that the 
limiting counting noise is Poisson in nature. Since 
the variance of the Poisson distribution is equal to 
the mean, such an assumption leads to a simple er- 
ror (standard deviation) estimate, and error propa- 
gation techniques may then be used for estimating 
uncertainties for net signals and analyte concentra- 
tions. 
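
The error estimate that follows from the Poisson assumption can be sketched in a few lines of Python. The sketch assumes equal counting times for the gross and background measurements and an error-free calibration factor; the function names and the counts used are hypothetical.

```python
import math

def net_signal(gross_counts, background_counts):
    """Net counts and standard deviation, assuming both counts are Poisson."""
    net = gross_counts - background_counts
    # Poisson: the variance equals the mean, estimated here by the observed count.
    sd = math.sqrt(gross_counts + background_counts)
    return net, sd

def concentration(net, sd_net, sensitivity):
    """Convert a net signal to concentration, assuming an error-free calibration factor."""
    return net / sensitivity, sd_net / sensitivity

net, sd = net_signal(400, 100)
print(net, round(sd, 2), concentration(net, sd, sensitivity=10.0))
```

If the Poisson hypothesis fails, for example because of correlated events, the square-root estimate understates the true variability and every uncertainty propagated from it is too small.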

The primary objective of our investigation of 
noise was to test the validity of the Poisson hypoth- 
esis for very low-level counting data, with special 
emphasis on background counts. The validity of 
the Poisson assumption has long been one of the 
more intriguing questions in nuclear physics and 
chemistry, and it has therefore been the subject of 
some notable experiments [22]. Our experimental 
system was uniquely designed to permit a much 
more stringent test of this hypothesis, as it pro- 
vided individual arrival times for more than a mil- 
lion events. A second objective, if the Poisson 
assumption proves valid, is to provide a physical 
random number generator — a device operating by 
the laws of physics, to generate random numbers 
for use in numerical simulations, as an alternative 
to numerical pseudo-random numbers. 

A practical objective for investigating the low- 
level counting noise distribution derives from our 
physical knowledge of the measurement system, 
i.e., our knowledge of potential alternative hy- 
potheses. Perhaps the most important such alterna- 
tive is the possibility of correlated events in the 
radiation detector, which could have a profound 
influence on the magnitude and variability of our 
background noise. As indicated in figure 4, the ef- 
fective background is reduced by about a factor of 
100 through anticoincidence shielding. If, due to 
wall or gas impurity effects, just 1% of the electronically
canceled events were to produce a secondary,
time-delayed event in the central detector,
the effective background would be doubled! Time 
series and distributional analysis of the background 
noise thus allows us to investigate this alternative 
process. Knowledge of the statistical power of the 
null (Poisson) hypothesis test against this particular 
alternative is therefore vital both for the construc- 
tion of valid uncertainty intervals, and for under- 
standing the basic physics and chemistry of the 
background events. One illustration of the distributional
analysis is given in figure 5, where $\chi^2$ is used
to test deviations from the expected exponential 
distribution of time intervals between events. Fur- 
ther discussion of this investigation, including a 
tabulation of six alternative hypotheses, is given in 
reference [23]. Further investigation of sources of 
background noise is currently underway, using 
multivariate exploration of pulse shape characteris- 
tics. 
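
Two points from this discussion lend themselves to a short Python sketch: the arithmetic behind the doubling of the effective background, and a $\chi^2$ comparison of observed inter-event intervals with the exponential distribution expected for a Poisson process. This is an illustration only, not the procedure of reference [23]; it assumes a fully specified event rate and equal-probability bins, and the function names are hypothetical.

```python
import math

def effective_background(raw_rate, reduction_factor=100.0, leak_fraction=0.01):
    """Background rate when a fraction of the canceled events leaks back as delayed events."""
    accepted = raw_rate / reduction_factor   # events surviving anticoincidence
    canceled = raw_rate - accepted           # events electronically canceled
    return accepted + leak_fraction * canceled

def chi_square_exponential(intervals, rate, n_bins=10):
    """Chi-square statistic for intervals vs. an exponential law with known rate.

    Bins have equal probability under the null hypothesis, so each bin
    expects len(intervals) / n_bins events. Returns (statistic, degrees of freedom).
    """
    expected = len(intervals) / n_bins
    counts = [0] * n_bins
    for dt in intervals:
        u = 1.0 - math.exp(-rate * dt)       # exponential CDF maps dt to (0, 1)
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, n_bins - 1

print(effective_background(100.0))  # compare with the shielded rate of 1.0
```

The statistic is compared with the $\chi^2$ distribution for the returned degrees of freedom; the power of that comparison against a specific alternative, such as delayed secondary events, has to be evaluated separately.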



4.2 Multivariate Exploratory Analysis: 
Origins of Atmospheric Soot Particles 

Perhaps the best known applications of chemo- 
metrics involve multivariate techniques such as 
principal component analysis (PCA) and cluster 
analysis. Such techniques have reached a high de- 
gree of sophistication, as exploratory tools for the 
classification of samples which may be characterized
by multivariable patterns or "spectra." An excellent
introduction to the principles and methods
of the "soft" or empirical multivariate modeling 
techniques is given in reference [24]. PCA and re- 
lated techniques are especially useful for data ex- 
ploration, in that they permit ready visualization of 
sample relationships, provided there are not too 
many independent components in the system under 
investigation. Thus, a collection of mixtures of two 
components having quite complex, yet different, 
spectra or chemical patterns, can be represented by 
a set of points in a plane, or on a line if the mixtures 
are normalized. If the pure components are repre- 
sented, they appear as the end points. Two dimen- 
sional PCA plots thus allow us to display relations 
among mixtures of three normalized components; 
and three dimensions increase the display capability
to four components. Beyond exploratory display
capability, several methods of multivariate
chemical analysis may be employed for quantita- 
tive estimates for the number and identity of com- 
ponents, and for the analysis of mixtures [25]. 
These are outgrowths of the seminal work of Lawton
and Sylvestre [26].
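
The geometric statement about normalized mixtures can be checked numerically. The Python sketch below uses hypothetical four-channel "spectra" and a simple power iteration in place of a full PCA routine; it is an illustration of the principle, not the software used in our work. Mixtures of two normalized components should place all of their variance on the first principal component, whereas mixtures of three should not.

```python
import math

def leading_pc_fraction(rows, n_iter=200):
    """Fraction of total variance carried by the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    total = sum(x * x for r in centered for x in r)
    # Start from the largest centered row so the start vector lies in the data span.
    v = max(centered, key=lambda r: sum(x * x for x in r))
    for _ in range(n_iter):
        # Multiply v by the (unnormalized) covariance matrix without forming it.
        scores = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(s * r[j] for s, r in zip(scores, centered)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(r, v)) for r in centered]
    return sum(s * s for s in scores) / total

def mix(fractions, spectra):
    """Linear mixture of normalized spectra, with fractions summing to one."""
    p = len(spectra[0])
    return [sum(f * s[j] for f, s in zip(fractions, spectra)) for j in range(p)]

# Hypothetical normalized spectra (each sums to one).
a = [0.7, 0.2, 0.1, 0.0]
b = [0.1, 0.1, 0.3, 0.5]
c = [0.25, 0.25, 0.25, 0.25]
two = [mix([f / 10, 1 - f / 10], [a, b]) for f in range(11)]
three = [mix([i / 4, j / 4, 1 - (i + j) / 4], [a, b, c])
         for i in range(5) for j in range(5 - i)]
print(leading_pc_fraction(two), leading_pc_fraction(three))
```

The sketch is noise-free; with real spectra the number of significant components has to be judged against the measurement error.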






Volume 93, Number 3, May-June 1988 

Journal of Research of the National Bureau of Standards 



Accuracy in Trace Analysis 



The interplay between the multivariate display 
techniques and chemical "intuition" (experience, 
knowledge) is exhibited in our investigation of 
laser microprobe mass spectra (LAMMS) of indi- 
vidual soot particles formed from the combustion 
of wood and fossil fuel. The scientific basis for our 
interest in this problem derives from the potential 
health effects of combustion particles, which often 
carry mutagens, on the one hand, and the geo- 
chemical and climatic implications, on the other. 
The ability to infer combustion sources for individ- 
ual soot particles could add greatly to our under- 
standing of climatic perturbations and perhaps 
even such phenomena as the Tertiary-Cretaceous 
Extinction [27]. PCA data exploration was attrac- 
tive for this study because the system was rela- 
tively simple in terms of intrinsic structure (two 
components), but relatively complex in terms of 
both the graphitic soot formation and laser plume 
ion formation processes. The work demonstrates,
however, an extremely important point with respect to
accuracy: the importance of having thoroughly reliable
chemical information for validation of the
exploratory techniques. This is shown
in figure 6. The upper part of the figure shows the 
successful classification of wood vs hydrocarbon 
fuel soot particles on the basis of their positive ion 
laser microprobe mass spectra. Application of this 
model, which was developed for laboratory-generated
particles, to soot particles collected in the field
(urban atmosphere), however, would lead to erroneous
conclusions (misclassification). The failed
classification shown in the lower part of the figure 
was discovered through the use of an independent 
tracer of known accuracy, $^{14}$C, for source discrimination
[28]. Subsequent research on this very important
basic and practical problem has led to some
understanding of the reason for the difference be- 
tween laboratory and field particles, a basic issue 
being sensitivity of certain species (features) to de- 
viations from the two-source, linear model. This 
example illustrates one of the more important cau- 
tions in the use of multivariate techniques, such as 
PCA and factor analysis (FA): namely, the influential
character of outliers and departures from assumptions.
Further investigation of the
atmospheric particles has shown the utility and rel- 
ative robustness of selected negative ion carbon 
clusters for combustion source discrimination, as 
shown in figure 7. Unlike PCA and FA approaches 
to exploratory multivariate data analysis, the coordinates
of the "bi-plot" of figure 7 are not perturbed
by outliers. Also, they are often more readily
interpretable chemically than eigenvectors, though 
clearly they do not possess the dimension reduction 
efficiency of PCA. 

4.3 Standard Test Data 

A special task for chemometrics is guaranteeing 
the accuracy of the data evaluation phase of the 
chemical measurement process. An important ele- 
ment in the task is the development of representa- 
tive, reference data sets having known 
characteristics, for testing the validity of data eval- 
uation. Such "standard test data" (STD) thus play 
the same role for data evaluation that SRMs do for 
procedure evaluation. STD are likely to become 
increasingly important as the data evaluation step 
becomes more complex, and as it becomes less ac- 
cessible to the user, as in automated analytical sys- 
tems. The nature and importance of STD for 
assessing interlaboratory precision and accuracy 
have been well demonstrated by exercises based on 
univariate gamma ray spectral data created by the 
International Atomic Energy Agency (IAEA) [29]
and multivariate atmospheric data created by NBS 
[30]. The parallelism with SRMs has been further 
established for the former STD through incorpora- 
tion into the catalog of the IAEA's Analytical 
Quality Control Service Program [31]. A brief de- 
scription of the objectives and outcome of the mul- 
tivariate STD exercise follows. (A more extended 
review of both exercises may be found in reference 
[16].) 

The objective of the multivariate STD exercise 
was to evaluate the resolving power, and precision 
and accuracy of all major mathematical techniques 
employed for aerosol source apportionment, based 
on linear models incorporating chemical "finger- 
prints" or spectra. To adequately test these tech- 
niques, which comprised various forms of 
multivariate factor or regression analysis, it was 
necessary to generate data matrices which were re- 
alistic simulations of the variations in source mixes 
found in an urban airshed. Also important was a 
realistic injection of random errors characterizing 
pure source profiles as well as "measured" ambient 
samples. This was accomplished by means of the 
linear equation given below, where the S_jt were
generated by applying a dispersion model incorpo- 
rating real meteorological data to two urban (geo- 
graphic) models. The STD generation scheme is 
illustrated for one of these urban models in figure 8. 
Generating Equation

\[
C_{it} = \sum_{j=1}^{P} \left[ A - e_m - e_H \right]_{ij} S_{jt} + e_{it},
\]

where:

$t$ = sampling period $[1 \le t \le 40]$

$C_{it}$ = "observed" concentration of species $i$, period $t$ $[1 \le i \le N,\ N \le 20]$

$S_{jt}$ = true intensity (at receptor) of source $j$ $[1 \le j \le P,\ P \le 13]$

$A_{ij}$ = "observed" source profile matrix (element $i,j$)

$e_{it}$ = random measurement errors, independent and normally distributed

$e_m$ = systematic source profile errors, independent and normally distributed (systematic, because fixed over the 40 sampling periods)

$e_H$ = random source profile variation errors, independent and log-normally distributed
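
The generation scheme can be illustrated with a minimal simulation sketch. It is not the NBS generator: the function name `generate_std`, the error magnitudes, and the way each error term enters are assumptions made for illustration. In particular, the systematic profile error is drawn once and held fixed over all periods, the log-normal profile variation is applied as a multiplicative factor with median 1, and the measurement error is taken proportional to the concentration. The dispersion model that produced the true source intensities is not reproduced; the intensities are simply passed in.

```python
import random


def generate_std(A, S, sd_meas=0.05, sd_sys=0.10, sigma_log=0.10, seed=0):
    """Simulate 'observed' concentrations C[i][t] from source profiles
    A[i][j] and true source intensities S[j][t].

    All error magnitudes are illustrative assumptions, expressed as
    relative standard deviations (sigma_log is on the log scale).
    """
    rng = random.Random(seed)
    n_species, n_sources, n_periods = len(A), len(S), len(S[0])

    # Systematic profile errors: normal, drawn once and then held fixed
    # over every sampling period.
    e_sys = [[rng.gauss(0.0, sd_sys * A[i][j]) for j in range(n_sources)]
             for i in range(n_species)]

    C = [[0.0] * n_periods for _ in range(n_species)]
    for t in range(n_periods):
        for i in range(n_species):
            total = 0.0
            for j in range(n_sources):
                # Random profile variation: log-normal, redrawn each period.
                # Assumption: applied as a multiplicative factor, median 1.
                factor = rng.lognormvariate(0.0, sigma_log)
                total += (A[i][j] + e_sys[i][j]) * factor * S[j][t]
            # Random measurement error: normal, independent for each (i, t).
            C[i][t] = total + rng.gauss(0.0, sd_meas * abs(total))
    return C
```

Setting all three error parameters to zero returns the error-free product of the profile matrix and the intensity matrix, which gives a convenient check on any apportionment code before errors are switched on.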

The outcome of the exercise was instructive. 
Though results for the several techniques were 
generally correlated, and agreement with the 
"truth" was generally within a factor of two, some 
important differences and discrepancies were observed.
For example, FA methods, in contrast to
weighted least squares estimation ("chemical mass
balance"), could not provide estimates for all components.
They were limited to a collection of four
or five component classes. Also, presumably identical
methods, operating on strictly identical data,
resulted in differing component estimates as well as
different standard error (SE) estimates. Comparing
the actual distributions of residuals to the quoted 
SEs, we found the latter to vary from gross under- 
estimates to gross overestimates. It was clear from 
this exercise that results depended heavily on "op- 
erator judgment," i.e., unique solutions could not 
be obtained without the use of certain, often im- 
plicit assumptions or decisions. It can be shown 
that problems of this sort, and in fact a large frac- 
tion of the multivariate problems in chemistry, are 
underdetermined or heavily dependent on assump- 
tions. This is a challenge to chemometrics. Chemi- 
cal knowledge combined with astute design should 
eliminate some of the inaccuracy connected with 
model selection, error treatment, and incautious 
use of criteria such as non-negativity. 
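
The weighted least squares ("chemical mass balance") estimate mentioned above can be sketched for a single sampling period as follows. This is a simplified illustration, not the procedure used by any participant: it weights only by the measurement uncertainties `sigma` of the ambient concentrations and ignores the uncertainties in the source profiles. The function names are hypothetical.

```python
import math


def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(M)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: source profiles are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cmb_wls(A, c, sigma):
    """Estimate source intensities s from c ~ A s by weighted least squares.

    A[i][j]: source profile matrix; c[i]: observed concentrations;
    sigma[i]: standard deviations of c[i] (weights are 1 / sigma**2).
    Returns (estimates, standard errors).
    """
    n_species, n_sources = len(A), len(A[0])
    w = [1.0 / s ** 2 for s in sigma]
    # Normal equations: (A^T W A) s = A^T W c
    N = [[sum(w[i] * A[i][j] * A[i][k] for i in range(n_species))
          for k in range(n_sources)] for j in range(n_sources)]
    b = [sum(w[i] * A[i][j] * c[i] for i in range(n_species))
         for j in range(n_sources)]
    cov = invert(N)  # covariance matrix of the estimates
    s_hat = [sum(cov[j][k] * b[k] for k in range(n_sources))
             for j in range(n_sources)]
    se = [math.sqrt(cov[j][j]) for j in range(n_sources)]
    return s_hat, se
```

The quoted standard errors come from the diagonal of the inverse normal matrix, so they are only as good as the assumed `sigma` values and the assumed error-free profiles. That is one way quoted SEs can end up as under- or overestimates of the actual residual scatter.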

Just as with SRMs, the above intercomparison 
was not the last word with this data set. Rather it
has served as a test bed for additional and newly
developed methods of multivariate chemical data
analysis [16], the most recent of which involves a
new, more accurate representation of multivariate
data by "parallel coordinate" systems [33]. In the
future, we would expect STD to continue to serve
several purposes: chemometric quality control
for both conventional and automated analytical
systems, assessment of interlaboratory or
interalgorithmic accuracy, and stimulation of
chemometric research on complex, multicomponent
systems.

5. Summary and Forecast 

In conclusion, let us consider for a moment the 
matter of forecast, as viewed from two perspec- 
tives: (1) What may be forecast for the future of 
chemometrics in relation to standards and accu- 
racy? (2) What directions are envisioned if we are 
to use chemometrics to improve our ability to un- 
derstand and forecast the behavior of external sys- 
tems, such as the environment? Key issues which 
comprise the answer to the first question are: 

• Nomenclature, including rigorous terminol- 
ogy and formulation of the performance character- 
istics of the CMP, plus standard nomenclature for 
methods of CMP design, control and evaluation 
derived from applied mathematics. 

• Optimal design of the overall analytical system 
to meet prescribed analytical needs and accuracy 
limits, utilizing detailed chemical knowledge of the 
characteristics of the individual CMP steps. 

• Attention to the validity of the analytical 
model, both the functional relationship and the 
noise models; specification of hypotheses and tests 
having adequate power with respect to chemically
significant alternative hypotheses. 

• Assessment of the accuracy of mathematical 
techniques as applied to chemical data, via al- 
gorithm or software evaluation, or overall data re- 
duction evaluation using STD. 

• Development of new methods of increased ac- 
curacy by iteratively linking CMP design, chemi- 
cal separation, instrumental measurement and data 
evaluation, to reduce dependence on unverified as- 
sumptions, and to improve precision through inter- 
ference reduction and application of expert 
knowledge. 

The second question relates to the fact that data- 
based, empirical models cannot be relied upon to 
provide information beyond their immediate do- 
main. That is, if we wish to be in a position to make 
accurate forecasts, or even accurate interpolations, 
for a given system, there is no substitute for a de- 
tailed mechanistic understanding of the properties 
(model) of that system. It is in this area that chemo- 
metrics, and analytical chemistry, have their great- 
est promise for the future. This prospect is best 
viewed in terms of a pair of interacting systems. 
The first system represents the raison d'etre or driv- 
ing force for analytical chemistry; it is the external 
system which depends on chemical analyses for its 
elucidation or control. The second system is the 
analytical system or CMP. Chemometrics has long 
recognized the linkage between these two systems, 
but much of the work has been based on sampling 
and measurements designed to establish empirical 
patterns, or "soft modeling" [34]. 

Soft modeling, which might be viewed as an out- 
growth of empirical, statistical modeling, is ex- 
tremely important for exploratory studies, and for 
providing statistical descriptions of empirical rela- 
tionships in complex chemical or biological sys- 
tems. In contrast, "hard global models . . . have 
great advantages both in their far-reaching predic- 
tions and their interpretation in terms of fundamen- 
tal quantities." And, unlike soft models, "the 
deviation between the hard model and the mea- 
sured data must not be larger than the errors of 
measurement" (Wold and Sjostrom, pp. 243ff, 
[34]). Increased movement in chemometrics to- 
ward hard modeling is clearly attractive because of 
the potential for increased basic understanding and 
increased accuracy; it is realistic in view of the 
enormous advances during the last decade in sampling
and measurement capabilities, and especially
in computational capacity. 

The transition toward more accurate representa- 
tion of the external physical, chemical or biological 
systems which analytical chemistry must serve is 
outlined in table 5. To complement Wold's basic 
categories, we present the "musical" classification 
of Douglas Hofstadter [35], and the mechanistic 
model categories often used to describe biological 
or environmental systems [36]. Hofstadter's descriptors
are apt. They convey succinctly the increasing
sophistication of models ("analogies") in
an area of enormous intrinsic complexity — artificial 
intelligence. The flow of models for the environ- 
mental system brings us immediately back to ana- 
lytical chemistry and chemometrics. That is, the 
linear model, such as that described in section 4.3, is
our simplest representation for an environmental 
system. Consistency and accuracy, governed by 
measurement error alone, cannot be generally expected
with so simple a model. Improvements may
be gained through: (1) combined chemometric
techniques, such as factor analysis followed by time
series analysis, to explore the dynamics of the system
[37]; and (2) "hybrid" modeling to take into
account certain non-linearities such as homogeneous
and heterogeneous reactions [38]. Major progress
in understanding and monitoring an
environmental system comes when natural "com- 
partments" may be defined, with differential equa- 
tions describing transfers between compartments 
[39]. When the compartmental description is inade- 
quate, one must consider an even more detailed de- 
scription of the system, generally by taking into 
consideration its full dynamic space-time character 
through the use of coupled equations representing 
transport and reaction [40]. These last two cate- 
gories of modeling and measurement are important 
for assessing the potential impact of human activities
on climate, in connection with the "CO2" problem
and the coupled reactive system CO-OH-CH4,
respectively [41].
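
As a purely illustrative sketch of the compartmental description, the fragment below integrates first-order transfers between compartments with a fourth-order Runge-Kutta step. The rate constants and the two-compartment example are hypothetical and are not taken from the models of references [39] or [41]; `k[i][j]` is assumed to be the first-order rate constant for transfer from compartment i to compartment j.

```python
def derivatives(x, k):
    """dx_i/dt = sum over j of (inflow from j) - (outflow to j)."""
    n = len(x)
    return [sum(k[j][i] * x[j] - k[i][j] * x[i] for j in range(n) if j != i)
            for i in range(n)]


def simulate(x0, k, dt, n_steps):
    """Integrate the compartment contents from x0 over n_steps of size dt
    using the classical fourth-order Runge-Kutta method."""
    x = list(x0)
    for _ in range(n_steps):
        k1 = derivatives(x, k)
        k2 = derivatives([xi + 0.5 * dt * d for xi, d in zip(x, k1)], k)
        k3 = derivatives([xi + 0.5 * dt * d for xi, d in zip(x, k2)], k)
        k4 = derivatives([xi + dt * d for xi, d in zip(x, k3)], k)
        x = [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Hypothetical two-compartment system: all material starts in compartment 0.
rates = [[0.0, 0.3],   # transfer 0 -> 1
         [0.1, 0.0]]   # transfer 1 -> 0
final = simulate([1.0, 0.0], rates, dt=0.01, n_steps=5000)
```

For this closed system the total is conserved and the contents relax toward the ratio fixed by the two rate constants. Measurements of the compartments over time can then be used to estimate the rate constants, which is where sampling and measurement design enter.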

Table 5. The transition from empirical to mechanistic modeling

                Model classes [empirical -> mechanistic]

S. Wold (a)     D. Hofstadter (b)       Environ. system
-----------     -----------------       ---------------
"Soft"          "Muzak"                 Linear (c)
                                        - hybrid (d)
                "Jazz"                  Compartment (e)
"Hard"          "Classical Music"       Full dynamic (f)

(a) See reference [34].
(b) See reference [35].
(c) Multivariate source apportionment (conservative tracers) [32].
(d) Particle-sulfate system apportionment [37,38].
(e) CO2 system: troposphere-biosphere-ocean; biological systems [36,39].
(f) CO-OH-CH4 system (production, transport, reaction) [40,41].

We face very important opportunities to gain in- 
creased fundamental knowledge of the nature 
(mechanistic models) and state of external (envi- 
ronmental, biological) systems through the use of 
hard, or at least harder, models to guide the sam- 
pling and measurement designs for these systems. 
By working closely with expert theoretical geo- 
chemists or biochemists, for example, chemometri- 
cians have the opportunity to design the analytical 
measurement process to optimally test alternative 
external models, to better estimate their parame- 
ters, and to more accurately evaluate their present 
state and future course [42]. 

Figure 1. Hypothesis testing and societally important detection
decisions. Sets of null (H0) and alternate (HA) hypotheses are
listed below a Truth Table and stylized probability density functions [19].

Figure 2. Hypothesis testing formulation for identification in analytical
chemistry. Probability density functions are given for
the difference in composition (Si) for particles emanating from
the same source vs two different sources [19].

Figure 3. Clearly visible gamma ray peaks (203Hg, 51Cr), which
were not detected above a 60Co background in the IAEA practical
examination of commercial software [20].

Figure 4. Low-level counting system. Penetrating cosmic rays (mu mesons) are removed as a background
component of the sample detector by coincidence with the guard ring detector, reducing
background by two orders of magnitude. (Uncertainties shown correspond to the Poisson standard
deviations for a 24 hour counting period.)

Figure 5. Chi-square test of the empirical equal probability histogram
for low-level counting data [23].

Figure 6. Isometric PCA projections of Lab and Ambient particle
LAMMS positive ion spectra on the first three eigenvectors.
Soot particles from wood are denoted "W" and "C"; those from
hydrocarbon fuel are denoted "H" and "A." Feature (mass) selection
on the basis of "characteristicity" preceded the principal
component analysis [28].

Figure 7. Bi-plot showing negative ion carbon cluster discrimination
of LAMMS spectra from ambient atmospheric soot
formed from the combustion of hydrocarbon fuel ["A"] and
wood ["C"].

Figure 8. Source apportionment STD. A_i2 represents the source profile vector for
source-2 (incinerator); S_2t represents the source intensity time series for the same
source. The lower portion of the figure shows the aerosol source emission map [16].